A Dynamic Adaptive Self-Organising Hybrid Model for Text Clustering
نویسندگان
چکیده
Clustering by document concepts is a powerful way of retrieving information from a large number of documents. This task in general does not make any assumption on the data distribution. In this paper, for this task we propose a new competitive Self-Organising (SOM) model, namely the Dynamic Adaptive Self-Organising Hybrid model (DASH). The features of DASH are a dynamic structure, hierarchical clustering, non-stationary data learning and parameter self-adjustment. All features are data-oriented: DASH adjusts its behaviour not only by modifying its parameters but also by an adaptive structure. The hierarchical growing architecture is a useful facility for such a competitive neural model which is designed for text clustering. In this paper, we have presented a new type of self-organising dynamic growing neural network which can deal with the non-uniform data distribution and the non-stationary data sets and represent the inner data structure by a hierarchical view.
منابع مشابه
A Self-Organising Hybrid Model for Dynamic Text Clustering
A text clustering neural model, traditionally, is assumed to cluster static text information and represent its inner structure on a flat map. However, the quantity of text information is continuously growing and the relationships between them are usually complicated. Therefore, the information is not static and a flat map may be not enough to describe the relationships of input data. In this pa...
متن کاملA novel self-organising clustering model for time-event documents
Purpose Neural document clustering techniques, e.g., self-organising map (SOM) or growing neural gas (GNG), usually assume that textual information is stationary on the quantity. However, the quantity of text is ever-increasing. We propose a novel dynamic adaptive self-organising hybrid (DASH) model, which adapts to time-event news collections not only to the neural topological structure but al...
متن کاملNeural Network Based Document Clustering Using WordNet Ontologies
Three novel text vector representation approaches for neural network based document clustering are proposed. The first is the extended significance vector model (ESVM), the second is the hypernym significance vector model (HSVM) and the last is the hybrid vector space model (HyM). ESVM extracts the relationship between words and their preferred classified labels. HSVM exploits a semantic relati...
متن کاملA Joint Semantic Vector Representation Model for Text Clustering and Classification
Text clustering and classification are two main tasks of text mining. Feature selection plays the key role in the quality of the clustering and classification results. Although word-based features such as term frequency-inverse document frequency (TF-IDF) vectors have been widely used in different applications, their shortcoming in capturing semantic concepts of text motivated researches to use...
متن کاملOn Document Classification with Self-Organising Maps
This research deals with the use of self-organising maps for the classification of text documents. The aim was to classify documents to separate classes according to their topics. We therefore constructed self-organising maps that were effective for this task and tested them with German newspaper documents. We compared the results gained to those of k nearest neighbour searching and k-means clu...
متن کامل